Efficient Feature Embeddings for Student Classification with Variational Auto-encoders

Authors

  • Severin Klingler
  • Rafael Wampfler
  • Tanja Käser
  • Barbara Solenthaler
  • Markus Gross
Abstract

Gathering labeled data in educational data mining (EDM) is a time- and cost-intensive task. However, the amount of available training data directly influences the quality of predictive models. Unlabeled data, on the other hand, is readily available in high volumes from intelligent tutoring systems and massive open online courses. In this paper, we present a semi-supervised classification pipeline that makes effective use of this unlabeled data to significantly improve model quality. We employ deep variational auto-encoders to learn efficient feature embeddings that improve the performance of standard classifiers by up to 28% compared to completely supervised training. Further, we demonstrate on two independent data sets that our method outperforms previous methods for finding efficient feature embeddings and generalizes better to imbalanced data sets compared to expert features. Our method is data-independent and classifier-agnostic, and hence provides the ability to improve performance on a variety of classification tasks in EDM.
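The pipeline the abstract describes, training a VAE on plentiful unlabeled data and feeding the learned embeddings to a standard classifier, can be sketched in miniature. The sketch below is illustrative only, not the authors' architecture: it uses a tiny tanh/linear VAE with manually derived gradients, synthetic two-group data as a stand-in for student features, and a nearest-centroid classifier trained on a handful of labels. All sizes and names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for EDM interaction features: two student groups,
# many unlabeled rows, few labels (dimensions are illustrative).
n, d, H, K = 200, 4, 8, 2
labels = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, d)) * 0.5
X[labels == 1] += np.array([2.0, -2.0, 1.0, 0.0])
X -= X.mean(0)
X /= X.std(0)

def init(shape):
    return rng.normal(scale=0.1, size=shape)

W1, b1 = init((d, H)), np.zeros(H)   # encoder hidden layer
Wm, bm = init((H, K)), np.zeros(K)   # posterior mean head
Wv, bv = init((H, K)), np.zeros(K)   # posterior log-variance head
W2, b2 = init((K, d)), np.zeros(d)   # linear decoder

lr, losses = 0.02, []
for step in range(500):
    h = np.tanh(X @ W1 + b1)
    mu, lv = h @ Wm + bm, h @ Wv + bv
    eps = rng.normal(size=mu.shape)
    z = mu + eps * np.exp(0.5 * lv)          # reparameterization trick
    Xh = z @ W2 + b2
    recon = 0.5 * np.sum((Xh - X) ** 2) / n
    kl = -0.5 * np.sum(1 + lv - mu**2 - np.exp(lv)) / n
    losses.append(recon + kl)

    # Manual backprop of the negative ELBO (recon + KL).
    dXh = (Xh - X) / n
    dW2, db2 = z.T @ dXh, dXh.sum(0)
    dz = dXh @ W2.T
    dmu = dz + mu / n
    dlv = dz * eps * 0.5 * np.exp(0.5 * lv) + 0.5 * (np.exp(lv) - 1) / n
    dWm, dbm = h.T @ dmu, dmu.sum(0)
    dWv, dbv = h.T @ dlv, dlv.sum(0)
    dh = dmu @ Wm.T + dlv @ Wv.T
    dpre = dh * (1 - h**2)
    dW1, db1 = X.T @ dpre, dpre.sum(0)
    for p, g in [(W1, dW1), (b1, db1), (Wm, dWm), (bm, dbm),
                 (Wv, dWv), (bv, dbv), (W2, dW2), (b2, db2)]:
        p -= lr * g

# Embeddings = posterior means; classify with only 5 labels per group
# via nearest centroid, mimicking the scarce-label setting.
emb = np.tanh(X @ W1 + b1) @ Wm + bm
c0 = emb[:5].mean(0)
c1 = emb[n // 2 : n // 2 + 5].mean(0)
pred = (np.linalg.norm(emb - c1, axis=1)
        < np.linalg.norm(emb - c0, axis=1)).astype(int)
acc = (pred == labels).mean()
```

Any classifier could replace the nearest-centroid step, which is the classifier-agnostic point the abstract makes: the VAE is trained without labels, and the labels touch only the final, cheap classification stage.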


Similar Articles

Inducing Symbolic Rules from Entity Embeddings using Auto-encoders

Vector space embeddings can be used as a tool for learning semantic relationships from unstructured text documents. Among others, earlier work has shown how in a vector space of entities (e.g. different movies) fine-grained semantic relationships can be identified with directions (e.g. more violent than). In this paper, we use stacked denoising auto-encoders to obtain a sequence of entity embed...

Full text

Testing the limits of unsupervised learning for semantic similarity

Semantic similarity between two sentences can be defined as a way to determine how related or unrelated two sentences are. The task of semantic similarity in terms of distributed representations can be thought of as generating sentence embeddings (dense vectors) which take both the context and the meaning of a sentence into account. Such embeddings can be produced by multiple methods; in this paper we try ...

Full text

Scoring and Classifying with Gated Auto-Encoders

Auto-encoders are perhaps the best-known non-probabilistic methods for representation learning. They are conceptually simple and easy to train. Recent theoretical work has shed light on their ability to capture manifold structure, and drawn connections to density modeling. This has motivated researchers to seek ways of auto-encoder scoring, which has furthered their use in classification. Gated...

Full text

Conditional Autoencoders with Adversarial Information Factorization

Generative models, such as variational auto-encoders (VAE) and generative adversarial networks (GAN), have been immensely successful in approximating image statistics in computer vision. VAEs are useful for unsupervised feature learning, while GANs alleviate supervision by penalizing inaccurate samples using an adversarial game. In order to utilize benefits of these two approaches, we combine t...

Full text

Generative Adversarial Source Separation

Generative source separation methods, such as non-negative matrix factorization (NMF) or auto-encoders, rely on the assumption of an output probability density. Generative adversarial networks (GANs) can learn data distributions without needing a parametric assumption on the output density. We show on a speech source separation experiment that a multilayer perceptron trained with a Wasserstein-...

Full text


Journal:

Volume   Issue

Pages   -

Publication date: 2017